Practical 2 - microbiota analyses

2024 NUTRIM microbiome & metabolome workshop

Author

David Barnett

Published

July 1, 2024

1 Intro to Practical 2

1.1 Instructions

Read, copy, and run each section of this walkthrough in the practical-2-notebook.qmd on posit.cloud.

Create code chunks to store and run code, and write notes alongside the code about what you are doing.

1.2 Research questions

Primary aim:

Does the bacterial gut microbiota composition of IBD-diagnosed patients differ from the control patients?

  • Diversity: Is richness or diversity associated with IBD diagnosis?

  • Composition: Does overall bacterial microbiota composition associated with IBD diagnosis?

  • Taxa: Is the relative abundance of specific bacterial taxa (e.g. genera) associated with IBD diagnosis?

1.3 Methods

  1. Diversity
    • Compare richness and diversity between groups
  2. Dissimilarity
    • (Interactively) Create ordination plots and bar charts
    • Compare overall compositions between groups
  3. Differential abundance
    • Compare relative abundance of individual taxa between groups

2 Setup

2.1 Load R packages 📦

here() starts at /Users/david/Documents/git/R-projects/teaching/workshops/2024-NUTRIM-microbiome
── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr     1.1.4     ✔ readr     2.1.5
✔ forcats   1.0.0     ✔ stringr   1.5.1
✔ ggplot2   3.5.1     ✔ tibble    3.2.1
✔ lubridate 1.9.3     ✔ tidyr     1.3.1
✔ purrr     1.0.2     
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
microViz version 0.12.4 - Copyright (C) 2021-2024 David Barnett
! Website: https://david-barnett.github.io/microViz
✔ Useful?  For citation details, run: `citation("microViz")`
✖ Silence? `suppressPackageStartupMessages(library(microViz))`
You can cite this package as:
     Patil, I. (2021). Visualizations with statistical details: The 'ggstatsplot' approach.
     Journal of Open Source Software, 6(61), 3167, doi:10.21105/joss.03167

2.2 Read phyloseq data

Read the phyloseq we created in part 1.

ps <- read_rds(file = here("data/papa2012/processed/papa12_phyloseq.rds"))

If you did not finish part 1B, you will not have created the phyloseq object.

Don’t worry, I prepared a backup copy, which you can use instead.

ps <- read_rds(file = here("data/papa2012/processed/backup/papa12_phyloseq.rds"))

You may also use this backup copy if you finished part 1B, but suspect you did something wrong.

3 Diversity

Our research questions for this section are:

  • How rich and diverse is the gut bacterial microbiota of each patient?
  • Does this gut microbiota richness or diversity differ by diagnosis group?

3.1 Richness

  • The simplest richness measure is just counting, a.k.a. “Observed Richness”.
  • We already computed the observed richness of genera in part 1B.
ps %>% 
  samdat_tbl() %>% 
  ggplot(aes(x = N_genera, y = diagnosis, color = diagnosis)) + 
  geom_boxplot(outliers = FALSE) +
  geom_jitter(height = 0.15, alpha = 0.5) +
  theme_classic()

You can use standard statistical testing on the richness values e.g. linear regression or ANOVA

richness_lm <- lm(data = samdat_tbl(ps), formula = N_genera ~ diagnosis)
anova(richness_lm)
Analysis of Variance Table

Response: N_genera
          Df  Sum Sq Mean Sq F value    Pr(>F)    
diagnosis  2  1837.3  918.63  7.8329 0.0007448 ***
Residuals 87 10203.2  117.28                      
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

You could do also standard ANOVA post-hoc pairwise comparisons.

richness_tukey <- TukeyHSD(aov(richness_lm))
richness_tukey
  Tukey multiple comparisons of means
    95% family-wise confidence level

Fit: aov(formula = richness_lm)

$diagnosis
               diff       lwr       upr     p adj
CD-Other  -5.275362 -12.81034  2.259620 0.2227571
UC-Other -10.782946 -17.36256 -4.203332 0.0005347
UC-CD     -5.507583 -12.17836  1.163197 0.1261212

Combine group comparison stats and plots in one go with ggstatsplot::ggbetweenstats()

ggbetweenstats(
  data = samdat_tbl(ps), x = diagnosis, y = N_genera, 
  type = "parametric", p.adjust.method = "fdr", var.equal = TRUE, 
  bf.message = FALSE, results.subtitle = TRUE
)

3.2 Diversity

Remember: a true measure of ecosystem diversity (e.g. Shannon index) will consider the richness and evenness of the ecosystem.

A rich ecosystem dominated by only one or two of its taxa is still a less diverse ecosystem than one with an even distribution of the same number of taxa.

We already computed Shannon diversity of genera and the effective Shannon in part 1B.

Here we will use the effective Shannon diversity \(e^H\) because of its more intuitive interpretation.

ps %>% 
  samdat_tbl() %>% 
  ggplot(aes(x = Effective_Shannon_Genus, y = diagnosis, color = diagnosis)) + 
  geom_boxplot(outliers = FALSE) +
  geom_jitter(height = 0.2) +
  theme_classic()

eShannon_lm <- lm(data = samdat_tbl(ps), formula = Effective_Shannon_Genus ~ diagnosis)
anova(eShannon_lm)
Analysis of Variance Table

Response: Effective_Shannon_Genus
          Df Sum Sq Mean Sq F value    Pr(>F)    
diagnosis  2 158.96  79.482  7.5185 0.0009731 ***
Residuals 87 919.73  10.572                      
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
eShannon_tukey <- TukeyHSD(aov(eShannon_lm))
eShannon_tukey
  Tukey multiple comparisons of means
    95% family-wise confidence level

Fit: aov(formula = eShannon_lm)

$diagnosis
              diff       lwr        upr     p adj
CD-Other -1.541564 -3.803828  0.7207006 0.2406931
UC-Other -3.170198 -5.145627 -1.1947686 0.0007080
UC-CD    -1.628634 -3.631435  0.3741666 0.1339407

You can get a tidy data frame of results using the broom::tidy function on various statistical model objects.

broom::tidy(eShannon_tukey)
# A tibble: 3 × 7
  term      contrast null.value estimate conf.low conf.high adj.p.value
  <chr>     <chr>         <dbl>    <dbl>    <dbl>     <dbl>       <dbl>
1 diagnosis CD-Other          0    -1.54    -3.80     0.721    0.241   
2 diagnosis UC-Other          0    -3.17    -5.15    -1.19     0.000708
3 diagnosis UC-CD             0    -1.63    -3.63     0.374    0.134   

This can be useful when you need to save your model output e.g. to Excel, to format for a table in your article.

broom::tidy(eShannon_tukey) %>% write_xlsx(here("practical-2/test-table.xlsx"))
ggbetweenstats(
  data = samdat_tbl(ps), x = diagnosis, y = Effective_Shannon_Genus, 
  type = "parametric", p.adjust.method = "fdr", var.equal = TRUE, 
  bf.message = FALSE, results.subtitle = TRUE
)

4 Dissimilarity & Ordination

Our research questions for this section are:

  • What gut bacterial microbiota compositions are present or common in this cohort?
  • Does the average overall composition of the gut microbiota differ by patient diagnosis group?

4.1 Aggregate ➔ Calculate ➔ Ordinate

In order to create a PCoA ordination - we need to first make two choices:

  • At which taxonomic rank will we aggregate the counts? (for 16S data, this is usually Genus)
  • Which dissimilarity measure to use when calculating the distance matrix?

4.2 Common dissimilarity measures

You heard about several commonly-used dissimilarity measures in the lecture. In the sections below, we will calculate a distance matrix with each one, and use each to plot a PCoA and test for diagnosis group differences with PERMANOVA.

An unweighted measure - based only on taxon presence or absence.

You must remember to run a “binary” transform on your data before computing “jaccard” distance.

psx_jaccard <- ps %>%
  tax_agg(rank = "Genus") %>%
  tax_transform("binary") %>% # converts counts to absence/presence: 0/1
  dist_calc(dist = "jaccard")
psx_jaccard
psExtra object - a phyloseq object with extra slots:

phyloseq-class experiment-level object
otu_table()   OTU Table:         [ 178 taxa and 90 samples ]
sample_data() Sample Data:       [ 90 samples by 20 sample variables ]
tax_table()   Taxonomy Table:    [ 178 taxa by 6 taxonomic ranks ]

otu_get(counts = TRUE)       [ 178 taxa and 90 samples ]

psExtra info:
tax_agg = "Genus" tax_trans = "binary" 

jaccard distance matrix of size 90 
0.7173913 0.5588235 0.7241379 0.5483871 0.75 ...
psx_jaccard %>% 
  ord_calc("PCoA") %>% 
  ord_plot(colour = "diagnosis") +
  stat_ellipse(aes(colour = diagnosis)) +
  coord_equal()

perm_jaccard <- psx_jaccard %>% dist_permanova(variables = "diagnosis")
perm_get(perm_jaccard)
Permutation test for adonis under reduced model
Marginal effects of terms
Permutation: free
Number of permutations: 999

vegan::adonis2(formula = formula, data = metadata, permutations = n_perms, by = by, parallel = parall)
          Df SumOfSqs      R2      F Pr(>F)    
diagnosis  2    1.970 0.08987 4.2954  0.001 ***
Residual  87   19.951 0.91013                  
Total     89   21.921 1.00000                  
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Bray-Curtis is an abundance-weighted dissimilarity measure.

It is probably the most commonly used dissimilarity measure in microbiome research.

psx_bray <- ps %>%
  tax_agg(rank = "Genus") %>%
  tax_transform("identity") %>% # the "identity" transform changes nothing
  dist_calc(dist = "bray")
psx_bray
psExtra object - a phyloseq object with extra slots:

phyloseq-class experiment-level object
otu_table()   OTU Table:         [ 178 taxa and 90 samples ]
sample_data() Sample Data:       [ 90 samples by 20 sample variables ]
tax_table()   Taxonomy Table:    [ 178 taxa by 6 taxonomic ranks ]

psExtra info:
tax_agg = "Genus" tax_trans = "identity" 

bray distance matrix of size 90 
0.8294664 0.7156324 0.5111652 0.5492537 0.8991251 ...
psx_bray %>% 
  ord_calc("PCoA") %>% 
  ord_plot(colour = "diagnosis") +
  stat_ellipse(aes(colour = diagnosis)) +
  coord_equal()

perm_bray <- psx_bray %>% dist_permanova(variables = "diagnosis")
perm_get(perm_bray)
Permutation test for adonis under reduced model
Marginal effects of terms
Permutation: free
Number of permutations: 999

vegan::adonis2(formula = formula, data = metadata, permutations = n_perms, by = by, parallel = parall)
          Df SumOfSqs      R2      F Pr(>F)    
diagnosis  2   1.5068 0.05441 2.5032  0.001 ***
Residual  87  26.1846 0.94559                  
Total     89  27.6914 1.00000                  
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

The phylogenetic distance family - has unweighted, weighted, and generalised versions.

You must use ASV-level data (i.e. no taxonomic aggregation) and have a phylogenetic tree available.

We will not practice with UniFrac distances today, because they can be quite slow to calculate.

Cartoon illustration of phylogenetic tree from: https://www.azolifesciences.com/article/What-is-Molecular-Phylogenetics.aspx

The Aitchison distance is a CoDA distance method - named after John Aitchison, a pioneer in the field of Compositional Data analysis. related to CLR + PCA.

psx_aitchison <- ps %>%
  tax_agg(rank = "Genus") %>%
  tax_transform("identity") %>% # the "identity" transform changes nothing
  dist_calc(dist = "aitchison")
psx_aitchison
psExtra object - a phyloseq object with extra slots:

phyloseq-class experiment-level object
otu_table()   OTU Table:         [ 178 taxa and 90 samples ]
sample_data() Sample Data:       [ 90 samples by 20 sample variables ]
tax_table()   Taxonomy Table:    [ 178 taxa by 6 taxonomic ranks ]

psExtra info:
tax_agg = "Genus" tax_trans = "identity" 

aitchison distance matrix of size 90 
28.2123 18.57284 19.08465 21.43771 23.69247 ...
psx_aitchison %>% 
  ord_calc("PCoA") %>% 
  ord_plot(colour = "diagnosis") +
  stat_ellipse(aes(colour = diagnosis)) +
  coord_equal()

perm_aitchison <- psx_aitchison %>% dist_permanova(variables = "diagnosis")
perm_get(perm_aitchison)
Permutation test for adonis under reduced model
Marginal effects of terms
Permutation: free
Number of permutations: 999

vegan::adonis2(formula = formula, data = metadata, permutations = n_perms, by = by, parallel = parall)
          Df SumOfSqs      R2      F Pr(>F)    
diagnosis  2     2552 0.10114 4.8946  0.001 ***
Residual  87    22681 0.89886                  
Total     89    25233 1.00000                  
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

You can see from the PERMANOVA model outputs that the p value is below 0.05. So there is good statistical evidence that the bacterial gut microbiota composition of C-section delivered infants has a different composition than vaginally delivered infants at 4 days of age.

These terms are often used interchangeably.

Strictly, all distances are dissimilarities, but not all dissimilarities are distances.

A true “distance metric” \(d\), must satisfy 3 properties:

  1. Identity of indiscernibles: For any samples \(a\) and \(b\), \(d(a, b) = 0\) if and only if \(a = b\)

  2. Symmetry: For any samples \(a\) and \(b\), \(d(a, b) = d(b, a)\)

  3. Triangle inequality: For any samples \(a\), \(b\), and \(c\), \(d(a, c) ≤ d(a, b) + d(b, c)\)

    • 3 means: “the direct path between two points must be at least as short as any detour”
    • 3 is not true for e.g. Bray-Curtis… but in practice this is very rarely problematic

microViz often creates objects of class psExtra which store info about the aggregation and transformations you perform.

  • psExtra can also store a distance matrix (and an ordination or PERMANOVA results)
  • You can extract the distance matrix with dist_get()
distances <- psx_jaccard %>% dist_get()

as.matrix(distances)[1:4, 1:4]
          099_AX    199_AX    062_BZ    194_AZ
099_AX 0.0000000 0.7173913 0.5588235 0.7241379
199_AX 0.7173913 0.0000000 0.7692308 0.8695652
062_BZ 0.5588235 0.7692308 0.0000000 0.6875000
194_AZ 0.7241379 0.8695652 0.6875000 0.0000000

Notice how the Binary Jaccard dissimilarities range between 0 (identical) and 1 (no shared genera).

range(as.matrix(distances))
[1] 0 1

Principal Co-ordinates Analysis is one kind of ordination:

  • PCoA takes a sample-sample distance matrix and finds new dimensions (a coordinate system)

  • The new dimensions are created with the aim to preserve the original distances between samples

  • It also aims to capture the majority of this distance information in the first dimensions

  • This makes it easier to visualize the patterns in your data, in 2D scatterplots 👀

For more info, see “GUSTAME”

There is helpful info about ordination methods, including PCoA, on the GUide to STatistical Analysis in Microbial Ecology website (GUSTA ME). https://sites.google.com/site/mb3gustame/dissimilarity-based-methods/principal-coordinates-analysis

This website covers a lot of other topics too, which may be interesting for you to read at a later date if you’ll work on microbiome analysis.

5 More PERMANOVA

“Permutational multivariate analysis of variance” - what does that mean?

  • Permutational - statistical significance estimates obtained by shuffling the data many times
  • Multivariate - more than one dependent/outcome variable (i.e. the pairwise distances)
  • Analysis of variance - ANOVA (statistical modelling approach)

You can adjust for covariates in PERMANOVA, and often should, depending on your study design.

Let’s fit a more complex model, adjusting for sex and age.

ps %>%
  tax_agg(rank = "Genus") %>%
  dist_calc(dist = "bray") %>%
  dist_permanova(
    variables = c("diagnosis", "gender", "age_years"),
    n_perms = 999, seed = 111
  ) %>%
  perm_get()
2024-07-01 12:37:37.541204 - Starting PERMANOVA with 999 perms with 1 processes
2024-07-01 12:37:37.906394 - Finished PERMANOVA
Permutation test for adonis under reduced model
Marginal effects of terms
Permutation: free
Number of permutations: 999

vegan::adonis2(formula = formula, data = metadata, permutations = n_perms, by = by, parallel = parall)
          Df SumOfSqs      R2      F Pr(>F)    
diagnosis  2   1.4591 0.05269 2.4103  0.001 ***
gender     1   0.1557 0.00562 0.5145  0.953    
age_years  1   0.2899 0.01047 0.9577  0.472    
Residual  85  25.7279 0.92909                  
Total     89  27.6914 1.00000                  
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Use more permutations for a more precise and reliable p.value in your real work (it is slower).

Always set a seed number for reproducibility of this random permutation method!

We saw that diagnosis group is significantly associated with microbiota composition.

We probably also want to know if there are differences between each pair of diagnoses: CD, UC, Other.

From the previous ordination plot, we might hypothesise that UC shows the clearest difference.

psx_bray %>% 
  ord_calc("PCoA") %>% 
  ord_plot(colour = "diagnosis") +
  stat_ellipse(aes(colour = diagnosis)) +
  coord_equal()

There is no posthoc testing routine for PERMANOVA - so instead we will check each comparison individually.

We will use ps_filter() to exclude the samples from each diagnosis group each time.

ps %>%
  ps_filter(diagnosis != "CD") %>% 
  tax_transform(trans = "identity", rank = "Genus") %>% 
  dist_calc("bray") %>% 
  dist_permanova(variables = "diagnosis", seed = 42)
psExtra object - a phyloseq object with extra slots:

phyloseq-class experiment-level object
otu_table()   OTU Table:         [ 167 taxa and 67 samples ]
sample_data() Sample Data:       [ 67 samples by 20 sample variables ]
tax_table()   Taxonomy Table:    [ 167 taxa by 6 taxonomic ranks ]

psExtra info:
tax_agg = "Genus" tax_trans = "identity" 

bray distance matrix of size 67 
0.8294664 0.5111652 0.5492537 0.8991251 0.9725358 ...

permanova:
Permutation test for adonis under reduced model
Marginal effects of terms
Permutation: free
Number of permutations: 999

vegan::adonis2(formula = formula, data = metadata, permutations = n_perms, by = by, parallel = parall)
          Df SumOfSqs      R2      F Pr(>F)   
diagnosis  1   0.9927 0.04529 3.0832  0.005 **
Residual  65  20.9287 0.95471                 
Total     66  21.9214 1.00000                 
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
ps %>%
  ps_filter(diagnosis != "UC") %>% 
  tax_transform(trans = "identity", rank = "Genus") %>% 
  dist_calc("bray") %>% 
  dist_permanova(variables = "diagnosis", seed = 42) 
psExtra object - a phyloseq object with extra slots:

phyloseq-class experiment-level object
otu_table()   OTU Table:         [ 145 taxa and 47 samples ]
sample_data() Sample Data:       [ 47 samples by 20 sample variables ]
tax_table()   Taxonomy Table:    [ 145 taxa by 6 taxonomic ranks ]

psExtra info:
tax_agg = "Genus" tax_trans = "identity" 

bray distance matrix of size 47 
0.5969582 0.7538629 0.8122644 0.8410256 0.9814676 ...

permanova:
Permutation test for adonis under reduced model
Marginal effects of terms
Permutation: free
Number of permutations: 999

vegan::adonis2(formula = formula, data = metadata, permutations = n_perms, by = by, parallel = parall)
          Df SumOfSqs     R2      F Pr(>F)
diagnosis  1   0.2911 0.0248 1.1442  0.298
Residual  45  11.4500 0.9752              
Total     46  11.7411 1.0000              
ps %>%
  ps_filter(diagnosis != "Other") %>% 
  tax_transform(trans = "identity", rank = "Genus") %>% 
  dist_calc("bray") %>% 
  dist_permanova(variables = "diagnosis", seed = 42) 
psExtra object - a phyloseq object with extra slots:

phyloseq-class experiment-level object
otu_table()   OTU Table:         [ 158 taxa and 66 samples ]
sample_data() Sample Data:       [ 66 samples by 20 sample variables ]
tax_table()   Taxonomy Table:    [ 158 taxa by 6 taxonomic ranks ]

psExtra info:
tax_agg = "Genus" tax_trans = "identity" 

bray distance matrix of size 66 
0.7156324 0.5111652 0.5492537 0.8991251 0.8081828 ...

permanova:
Permutation test for adonis under reduced model
Marginal effects of terms
Permutation: free
Number of permutations: 999

vegan::adonis2(formula = formula, data = metadata, permutations = n_perms, by = by, parallel = parall)
          Df SumOfSqs      R2      F Pr(>F)   
diagnosis  1   0.8396 0.04031 2.6881  0.007 **
Residual  64  19.9905 0.95969                 
Total     65  20.8301 1.00000                 
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Your methodological choices matter, so you should report what you did:

  • any relevant rare taxon filtering
  • the taxonomic rank of aggregation
  • the dissimilarity measure used to compute the pairwise distances
  • any covariates included in the statistical model

It’s a good idea to decide on one or two distance measures a priori, and report both (at least in supplementary material).  The choice of distance measure can affect results and conclusions!

More details on PERMANOVA

See this excellent online book chapter by Marti Anderson.

The GUide to STatistical Analysis in Microbial Ecology website
where PERMANOVA is called NP-MANOVA (non-parametric MANOVA).

6 Interactive ordination!

microViz provides a Shiny app ord_explore() to interactively create and explore PCoA plots and other ordinations. Let’s give it a try!

To start the Shiny app, copy these lines to your Console (not the Quarto doc!)

# Note: we filter out OTUs that only appear in 1 sample, to speed up the app
ps %>%
  tax_filter(min_prevalence = 2, verbose = FALSE) %>%
  ord_explore()
Instructions / Suggestions
  1. Colour the samples using the variables in the sample data

  2. Select a few samples to view their composition on bar charts!

  3. Change some ordination options:

    • Different rank of taxonomic aggregation
    • Different distances we’ve discussed
  4. Copy the automatically generated code

    • Exit the app (click the red 🛑 button in R console!)
    • Paste and run the code to recreate the ordination plot
    • Customise the plot: change colour scheme, title, etc.
  5. Launch the app again with a different subset of the data:

    • Practice using ps_filter()
    • e.g. use the data of only the UC patients’ gut microbiota!
    • Colour points by the dominant genus?
For suggestion 5: try this code in your Console
ps %>%
  tax_filter(min_prevalence = 2, verbose = FALSE) %>%
  ps_filter(diagnosis == "UC") %>%
  # calculate dominant Genus for each sample (optional)
  ps_calc_dominant(rank = "Genus", none = "Mixed", other = "Other") %>%
  ord_explore()
Unblock popups?!

To allow the interactive function to open a new tab in your browser, you may need to unblock pop-ups for posit.cloud

If you don’t see anything after running the ord_explore command, check for messages/notifications from your browser.

Beware: some important notes on interactive analysis

There are many distances available

Feel free to try out distances we haven’t talked about, BUT:

  1. You should not use a distance that you don’t understand in your work, even if the plot looks nice! 😉
  2. A few of the distances might not work correctly…
    • They are mostly implemented in the package vegan and I haven’t tested them all
    • Errors may appear in the RStudio Console
    • You can report to me any distances that don’t work (if you’re feeling helpful! 😇)

There are several other ordination methods available

Try out PCA, principal components analysis, which does NOT use distances

We will not discuss “constrained” or “conditioned” ordinations today, but if you are interested in e.g. RDA, check the Guide to Statistical Analysis in Microbial Ecology

7 PCA

“Principal Components Analysis”

For practical purposes, PCA is quite similar to Principal Co-ordinates Analysis.

In fact, PCA produces equivalent results to PCoA with Euclidean distances.

Euclidean distances are essentially a generalization of Pythagoras’ theorem to more dimensions.

In our data every taxon is a feature, a dimension, on which we calculate Euclidean distances.

Pythagoras’ theorem:

\[c = \sqrt{a^2 + b^2}\]

Euclidean distance:

\[d\left(p, q\right) = \sqrt{\sum _{i=1}^{n_{taxa}} \left( p_{i}-q_{i}\right)^2 }\]

Distance \(d\) between samples \(p\) and \(q\), with \(n\) taxa.

7.1 Why is PCA interesting?

  • Principal components are built directly from a (linear) combination of the original features.

  • That means we know how much each taxon contributes to each PC dimension

  • We can plot this information (loadings) as arrows, alongside the sample points

Code
pca <- ps %>%
  tax_filter(min_prevalence = 2, verbose = FALSE) %>%
  tax_transform(rank = "Genus", trans = "clr", zero_replace = "halfmin") %>%
  ord_calc(method = "PCA") %>%
  ord_plot(
    alpha = 0.6, size = 2, color = "diagnosis", 
    plot_taxa = 1:6, tax_vec_length = 0.5,
    tax_lab_style = tax_lab_style(
      type = "text", max_angle = 90, aspect_ratio = 1,
      size = 3, fontface = "bold"
    ),
  ) +
  theme_classic(12) +
  coord_fixed(ratio = 1, xlim = c(-3, 3), ylim = c(-3, 3), clip = "off")

pca

Interpretation

Interestingly, samples on the right of the plot (which tend to be UC patients) seem to have relatively more Escherichia/Shigella, and maybe less Blautia, Faecalibacterium and Roseburia.

Your plot may be mirrored!

When I run this code on my laptop, the UC patients are on the right of the PCA (and the Escherichia/Shigella loading arrow too)

But on your computer, the PCA plot might be mirrored, so they would be on the left side!

This is not a problem, as the sign (plus or minus) of a PC and its loadings is arbitrary, and the interpretation does not change.

This happens on different computers if they use slightly different software to solve linear algebra problems.

In general:

The relative length and direction of an arrow indicates how much that taxon contributes to the variation on each visible PC axis, e.g. Variation in Faecalibacterium abundance contributes quite a lot to variation along the PC1 axis.

The direction allows you to infer that samples positioned towards the left of the plot will tend to have higher relative abundance of Faecalibacterium than samples on the right of the plot.

Bacteroides variation contributes to both PC1 and PC2, as indicate by its high (negative) values on both axes.

But be cautious:

We can make another kind of bar plot, using the PCA information to order our samples in a circular layout.

This can help complement our interpretation of the PCA plot loadings.

iris <- ps %>%
  tax_filter(min_prevalence = 2, verbose = FALSE) %>%
  tax_transform(rank = "Genus", trans = "clr", zero_replace = "halfmin") %>%
  ord_calc(method = "PCA") %>%
  ord_plot_iris(
    tax_level = "Genus", n_taxa = 12, other = "Other",
    anno_colour = "diagnosis",
    anno_colour_style = list(alpha = 0.6, size = 0.6, show.legend = FALSE)
  )
patchwork::wrap_plots(pca, iris, nrow = 1, guides = "collect")

7.2 Centered Log Ratio transformation:

These plots look weird! most samples bunch in the middle, with spindly projections..

Code
ps %>%
  tax_agg(rank = "Genus") %>%
  dist_calc(dist = "euclidean") %>%
  ord_calc(method = "PCoA") %>%
  ord_plot(alpha = 0.6, size = 2) +
  geom_rug(alpha = 0.1) +
  coord_equal()

Code
ps %>%
  tax_agg(rank = "Genus") %>%
  ord_calc(method = "PCA") %>%
  ord_plot(alpha = 0.6, size = 2) +
  geom_rug(alpha = 0.1) +
  coord_equal()

Why doesn’t this work?

  • These ordinations are sensitive to sparsity (double-zero problem) -> aggregate taxa (and maybe filter rare taxa)

  • Excessive emphasis on high-abundance taxa -> log transform features first

  • Sequencing data are compositional -> try the centered log ratio (CLR) transformation

Remember, “Microbiome Datasets Are Compositional: And This Is Not Optional.” Gloor et al. 2017

The sequencing data gives us relative abundances, not absolute abundances.

The total number of reads sequenced per sample is an arbitrary total.

This leads to two main types of problem:

  • Statistical issues: taxon abundances are not independent, and may appear negatively correlated

  • Interpretation caveats: e.g. if one taxon blooms, the relative abundance of all other taxa will appear to decrease, even if they did not.

These issues are theoretically worse with simpler ecosystems (fewer taxa), e.g. vaginal microbiota.

The CLR transformation is useful for compositional microbiome data.

  1. Find the geometric mean of each sample
  2. Divide abundance of each taxon in that sample by this geometric mean
  3. Take a logarithm of these ratios

Problem: log(0) is undefined. So we need to do something about all the zeroes in our OTU table

Solution: add a small amount to every value (or just every zero), before applying the log transformation.

This small value is often called a pseudo-count.

What value should we use for the pseudo-count?

  • One easy option is to just add a count of 1
  • Another popular option is to add half of the smallest observed value (from across the whole dataset)
  • In general, for zero replacement, keep it simple and record your approach

8 Differential abundance

From the PCA loadings and the bar charts below, we have some suspicions about which Genera might differ in abundance in Case vs. Controls.

We can statistically test this for each taxon. This is often called “differential abundance” (DA) testing, in the style of “differential expression” (DE) testing from the transcriptomics field.

ps %>%
  tax_transform("compositional") %>% 
  comp_barplot(
    tax_level = "Genus", n_taxa = 12, facet_by = "diagnosis", 
    label = NULL, merge_other = FALSE
  ) +
  coord_flip() +
  theme(axis.ticks.y = element_blank())
Registered S3 method overwritten by 'seriation':
  method         from 
  reorder.hclust vegan

More examples of visualizing microbiota compositions using stacked bar charts can be found here:
https://david-barnett.github.io/microViz/articles/web-only/compositions.html

8.1 Model one taxon

We will start by creating a linear regression model for one genus-level category, Escherichia/Shigella.

We will fit a model with covariates, as we did for PERMANOVA

  • We will convert the categorical variables into indicator (dummy) variables
  • We will scale the continuous covariates to 0 mean and SD 1 (z-scores)
  • You’ll see this will make our plots easier to interpret later
ps <- ps %>%
  ps_mutate(
    IBD = if_else(case_control == "Case", true = 1, false = 0),
    Female = if_else(gender == "female", true = 1, false = 0),
    Age_Z = scale(age_years, center = TRUE, scale = TRUE)
  )

We will transform the count data by first making it proportions, and then taking a base 2 logarithm, log2, after adding a pseudocount.

escherReg <- ps %>%
  tax_transform("compositional", rank = "Genus") %>%
  tax_model(
    type = "lm", rank = "Genus", taxa = "Escherichia/Shigella",
    trans = "log2", trans_args = list(zero_replace = "halfmin"),
    variables = c("IBD", "Female", "Age_Z"),
    return_psx = FALSE
  ) %>%
  pluck(1)
Modelling: Escherichia/Shigella
summary(escherReg)

Call:
`Escherichia/Shigella` ~ IBD + Female + Age_Z

Residuals:
    Min      1Q  Median      3Q     Max 
-6.2823 -2.0197 -0.2065  2.9789  8.1094 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept) -12.3349     1.0546 -11.696  < 2e-16 ***
IBD           4.5143     1.0948   4.123 8.58e-05 ***
Female        0.4594     0.8614   0.533    0.595    
Age_Z        -0.2339     0.4859  -0.481    0.631    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 4.042 on 86 degrees of freedom
Multiple R-squared:  0.1864,    Adjusted R-squared:  0.1581 
F-statistic: 6.569 on 3 and 86 DF,  p-value: 0.0004734

The method we have used is borrowed from MaAsLin2 - developed by the Huttenhower lab at Harvard.

  • Note: they call the compositional transformation “Total Sum Scaling (TSS)”
  • This is quite a straightforward method, so we will stick with this for today

But, many other statistical methods have been developed for differential abundance analyses. Why?

Microbiota sequencing data are quite awkward, statistically, due to their sparseness and compositionality. Each successive method claims to handle some aspect of this awkwardness “better” than previous methods.

The aim is to have a method with adequate power to detect true associations, whilst controlling the type 1 error rate, the “false positive” reporting of associations that are not “truly” present.

Results are surprisingly inconsistent across different methods, as demonstrated recently in an analysis by Jacob Nearing et al.

So, what to do?

  1. Filter out the noise & interpret results with caution! use multiple testing corrections
  2. Try two or three methods and/or use same method as a previous study if replicating (maybe avoid LEfSe and edgeR)
  3. If your design needs it, choose a method that allows covariate adjustment or random effects (for time-series)
  4. Discuss appropriate choices for your study with us at MUMC Medical Microbiology

8.2 Now model all the taxa!

We’re not normally interested in just one taxon

It’s also hard to decide which taxonomic rank we are most interested in modelling!

  • Lower ranks like ASVs (or genera) give better resolution but also more sparsity and classification uncertainty…
  • Higher ranks e.g. orders, could also be more powerful if you think most taxa within that order will follow a similar pattern.

So now we will fit a similar model for almost every taxon* at every rank from phylum to genus

*We’ll filter out genera with a prevalence of less than 10%

We often want to filter out rare taxa before performing some kinds of analysis.

Rare taxa might sometimes be:

  1. Sequencing errors
  2. Statistically problematic
  3. Biologically irrelevant

Overall, it’s less likely that we are interested in rare taxa, and models of rare taxon abundances are more likely to be unreliable.
Reducing the the number of taxa modelled also makes the process faster and makes visualizing the results easier!

What is rare?

  • Low prevalence - taxon only detected in a small number of samples in your dataset.
  • Low abundance - relatively few reads assigned to that taxon (on average or in total)

How to pick a threshold, depends on what analysis method you are filtering for!

  • alpha diversity: do not filter
  • beta diversity: relevance of threshold depends on your distance measure
  • differential abundance: stringent filtering, prevalence >5%, >10%?

Let’s say we are not interested in Genera that occur in fewer than 10% of samples, and they have to have at least 100 reads in total across all samples.

ps_genus <- ps %>% tax_agg(rank = "Genus") %>% ps_get()

Count genera before filtering

ntaxa(ps_genus) 
[1] 178

Count genera after filtering

ps_genus %>%
  tax_filter(min_prevalence = 0.1, min_total_abundance = 100) %>%
  ntaxa() 
Proportional min_prevalence given: 0.1 --> min 9/90 samples.
[1] 56

Wow so that would remove most of our unique Genera!

What is going on? Let’s make some plots!

Code
# first make a table of summary statistics for the unique genera
psGenusStats <- tibble(
  taxon = taxa_names(ps_genus),
  prevalence = microbiome::prevalence(ps_genus),
  total_abundance = taxa_sums(ps_genus)
)

p <- psGenusStats %>%
  ggplot(aes(total_abundance, prevalence)) +
  geom_point(alpha = 0.5) +
  geom_rug(alpha = 0.1) +
  scale_x_log10(labels = scales::label_number(), name = "Total Abundance") +
  scale_y_continuous(
    labels = scales::label_percent(), breaks = scales::breaks_pretty(n = 9),
    name = "Prevalence (%)",
    sec.axis = sec_axis(
      transform = ~ . * nsamples(ps), 
      breaks = scales::breaks_pretty(n = 9),
      name = "Prevalence (N samples)"
    )
  ) +
  theme_bw()

p

So most Genera have a low prevalence, and handful have way more reads than most.

Let’s label those points to check which taxa are the big time players.

Code
p + ggrepel::geom_text_repel(
  data = function(df) filter(df, prevalence > 0.5 | total_abundance > 5000),
  mapping = aes(label = taxon), size = 2, min.segment.length = 0
)

Those taxa make sense for this dataset of gut microbiota samples.

Now let’s zoom in on the less prevalent taxa by log-transforming the y axis.
We’ll also add lines indicating the thresholds of 10% prevalence and 1000 reads abundance.

Code
psGenusStats %>%
  ggplot(aes(x = total_abundance, y = prevalence)) +
  geom_vline(xintercept = 100, color = "red", linetype = "dotted") +
  geom_hline(yintercept = 10 / 100, color = "red", linetype = "dotted") +
  geom_point(alpha = 0.5) +
  geom_rug(alpha = 0.1) +
  scale_x_log10(labels = scales::label_number(), name = "Total Abundance") +
  scale_y_log10(
    labels = scales::label_percent(), 
    breaks = scales::breaks_log(n = 9),
    name = "Prevalence (%)",
    sec.axis = sec_axis(
      transform = ~ . * nsamples(ps), 
      breaks = scales::breaks_log(n = 9),
      name = "Prevalence (N samples)"
    )
  ) +
  theme_bw()

We can break this down by phylum if we add the taxonomic table information.

Code
# don't worry about this code, just focus on the plot output
ps_genus %>%
  tax_table() %>%
  as.data.frame() %>%
  as_tibble(rownames = "taxon") %>%
  left_join(psGenusStats, by = "taxon") %>%
  add_count(Phylum, name = "phylum_count", sort = TRUE) %>%
  mutate(Phylum = factor(Phylum, levels = unique(Phylum))) %>% # to fix facet order
  mutate(Phylum = forcats::fct_lump_n(Phylum, n = 5)) %>%
  mutate(Phylum = forcats::fct_na_value_to_level(Phylum, level = "Other")) %>%
  ggplot(aes(total_abundance, prevalence)) +
  geom_vline(xintercept = 100, color = "red", linetype = "dotted") +
  geom_hline(yintercept = 10 / 100, color = "red", linetype = "dotted") +
  geom_point(alpha = 0.5, size = 1) +
  geom_rug(alpha = 0.2) +
  scale_x_log10(
    labels = scales::label_log(), 
    breaks = scales::breaks_log(n = 5),
    name = "Total Abundance"
  ) +
  scale_y_log10(
    labels = scales::label_percent(), 
    breaks = scales::breaks_log(n = 9),
    name = "Prevalence (%)",
    sec.axis = sec_axis(
      transform = ~ . * nsamples(shao19), 
      breaks = scales::breaks_log(n = 9),
      name = "Prevalence (N samples)"
    )
  ) +
  facet_wrap("Phylum") +
  theme_bw(10)

# The code for taxatree_models is quite similar to tax_model.
# tax_prepend_ranks ensures that each taxon at each rank is always unique.
psModels <- ps %>%
  tax_prepend_ranks() %>%
  tax_transform("compositional", rank = "Genus") %>%
  tax_filter(min_prevalence = 0.1, undetected = 0) %>%
  taxatree_models(
    type = lm,
    trans = "log2", trans_args = list(zero_replace = "halfmin"),
    ranks = c("Phylum", "Class", "Order", "Family", "Genus"),
    variables = c("IBD", "Female", "Age_Z")
  )
psModels
psExtra object - a phyloseq object with extra slots:

phyloseq-class experiment-level object
otu_table()   OTU Table:         [ 65 taxa and 90 samples ]
sample_data() Sample Data:       [ 90 samples by 23 sample variables ]
tax_table()   Taxonomy Table:    [ 65 taxa by 6 taxonomic ranks ]

otu_get(counts = TRUE)       [ 65 taxa and 90 samples ]

psExtra info:
tax_agg = "Genus" tax_trans = "compositional" 

taxatree_models list:
Ranks: Phylum/Class/Order/Family/Genus

Next we will get a data.frame containing the regression coefficient estimates, test statistics and corresponding p values from all these regression models.

psStats <- taxatree_models2stats(psModels)
psStats
psExtra object - a phyloseq object with extra slots:

phyloseq-class experiment-level object
otu_table()   OTU Table:         [ 65 taxa and 90 samples ]
sample_data() Sample Data:       [ 90 samples by 23 sample variables ]
tax_table()   Taxonomy Table:    [ 65 taxa by 6 taxonomic ranks ]

otu_get(counts = TRUE)       [ 65 taxa and 90 samples ]

psExtra info:
tax_agg = "Genus" tax_trans = "compositional" 

taxatree_stats dataframe:
126 taxa at 5 ranks: Phylum, Class, Order, Family, Genus 
3 terms: IBD, Female, Age_Z
# A tibble: 378 × 8
   term   taxon             rank   formula  estimate std.error statistic p.value
   <fct>  <chr>             <fct>  <chr>       <dbl>     <dbl>     <dbl>   <dbl>
 1 IBD    P: Firmicutes     Phylum `P: Fir…   -0.626     0.332    -1.89  6.25e-2
 2 Female P: Firmicutes     Phylum `P: Fir…   -0.391     0.261    -1.50  1.37e-1
 3 Age_Z  P: Firmicutes     Phylum `P: Fir…    0.151     0.147     1.02  3.09e-1
 4 IBD    P: Proteobacteria Phylum `P: Pro…    4.20      0.950     4.42  2.89e-5
 5 Female P: Proteobacteria Phylum `P: Pro…    0.778     0.747     1.04  3.01e-1
 6 Age_Z  P: Proteobacteria Phylum `P: Pro…   -0.568     0.422    -1.35  1.81e-1
 7 IBD    P: Bacteroidetes  Phylum `P: Bac…   -1.75      1.17     -1.49  1.39e-1
 8 Female P: Bacteroidetes  Phylum `P: Bac…   -0.802     0.922    -0.870 3.87e-1
 9 Age_Z  P: Bacteroidetes  Phylum `P: Bac…   -0.181     0.520    -0.348 7.29e-1
10 IBD    P: Actinobacteria Phylum `P: Act…   -1.67      0.916    -1.82  7.18e-2
# ℹ 368 more rows

We have performed a lot of statistical tests here, so it is likely that we could find some significant p-values by chance alone.

We should correct for multiple testing / control the false discovery rate or family-wise error rate.

Instead of applying these adjustment methods across all taxa models at all ranks, the default behaviour is to control the family-wise error rate per taxonomic rank.

psStats <- psStats %>% taxatree_stats_p_adjust(method = "BH", grouping = "rank")
# notice the new variable
psStats %>% taxatree_stats_get()
# A tibble: 378 × 9
# Groups:   rank [5]
   term   taxon rank  formula estimate std.error statistic p.value p.adj.BH.rank
   <fct>  <chr> <fct> <chr>      <dbl>     <dbl>     <dbl>   <dbl>         <dbl>
 1 IBD    P: F… Phyl… `P: Fi…   -0.626     0.332    -1.89  6.25e-2      0.188   
 2 Female P: F… Phyl… `P: Fi…   -0.391     0.261    -1.50  1.37e-1      0.291   
 3 Age_Z  P: F… Phyl… `P: Fi…    0.151     0.147     1.02  3.09e-1      0.496   
 4 IBD    P: P… Phyl… `P: Pr…    4.20      0.950     4.42  2.89e-5      0.000607
 5 Female P: P… Phyl… `P: Pr…    0.778     0.747     1.04  3.01e-1      0.496   
 6 Age_Z  P: P… Phyl… `P: Pr…   -0.568     0.422    -1.35  1.81e-1      0.346   
 7 IBD    P: B… Phyl… `P: Ba…   -1.75      1.17     -1.49  1.39e-1      0.291   
 8 Female P: B… Phyl… `P: Ba…   -0.802     0.922    -0.870 3.87e-1      0.496   
 9 Age_Z  P: B… Phyl… `P: Ba…   -0.181     0.520    -0.348 7.29e-1      0.729   
10 IBD    P: A… Phyl… `P: Ac…   -1.67      0.916    -1.82  7.18e-2      0.188   
# ℹ 368 more rows

taxatree_plots allows you to plot statistics from all of the taxa models onto a tree layout (e.g. point estimates and significance).

The taxon model results are organised by rank, radiating out from the central root node e.g. from Phyla around the center to Genus in the outermost ring.

taxatree_plots itself returns a list of plots, which you can arrange into one figure with the patchwork package and/or cowplot.

psStats %>%
  taxatree_plots(node_size_range = c(1, 3), sig_stat = "p.adj.BH.rank") %>%
  patchwork::wrap_plots(ncol = 2, guides = "collect")

But how do we know which taxa are which nodes? We can create a labelled grey tree with taxatree_plotkey. This labels only some of the taxa based on certain conditions that we specify.

set.seed(123) # label position
key <- psStats %>%
  taxatree_plotkey(
    taxon_renamer = function(x) stringr::str_remove(x, "[PFG]: "),
    # conditions below, for filtering taxa to be labelled
    rank == "Phylum" | rank == "Genus" & prevalence > 0.2
    # all phyla are labelled, and all genera with a prevalence of over 0.2
  )
key

You can learn how to customise these tree plots to your needs, with this extended tutorial on the microViz website

  • how to directly label taxa on the coloured plots
  • how to change the layout and style of the trees
  • how to use a different regression modelling approach

9 Extra challenge

If you have already carefully gone through everything above, you may be ready for a real challenge.

9.1 The secondary aims

Disease Activity:

Is current disease activity level associated with microbiota diversity, composition, or the relative abundance of specific taxa?

Medication:

Are IBD-related medications associated with microbiota diversity, composition, or the relative abundance of specific taxa?

9.2 Instructions?

No code is provided for this challenge - just suggestions.

Disease activity:

IBD is characterised by periods of active symptomatic disease interspersed with inactive or less symptomatic periods.

It is reasonable to hypothesise that microbiota dysbiosis might be more pronounced during periods of high activity.

Try looking at only the IBD patients (i.e. filter out the controls) to assess if disease activity is associated with microbiota diversity or composition.

Tip: A good place to start is with ps_filter and ord_explore

Medication:

Several patients recently took antibiotics, or corticosteroids or other immunosuppressive medications.

These medications are often used to manage symptoms or slow disease progression in IBD.

It is also reasonable to hypothesise that they might affect the gut microbiota composition.

Try creating a correlation heatmap to assess associations between the abundance of top genera and each medication.

Look at this tutorial article for ideas:
https://david-barnett.github.io/microViz/articles/web-only/heatmaps.html#correlation-heatmaps

Tip 1: Convert each variable to binary integers: 0 for medication not used, 1 for medication used.

Tip 2: Use a non-parametric correlation measure, and don’t include too many taxa.

10 Next

The final challenge awaits in Practical 3, which you can start after the metagenomics lecture.

david-barnett.github.io/2024-NUTRIM-microbiome/practical-3/web/practical3-instructions.html

11 Session info

sessioninfo::session_info()
─ Session info ───────────────────────────────────────────────────────────────
 setting  value
 version  R version 4.4.0 (2024-04-24)
 os       macOS Sonoma 14.5
 system   aarch64, darwin20
 ui       X11
 language (EN)
 collate  en_US.UTF-8
 ctype    en_US.UTF-8
 tz       Europe/Amsterdam
 date     2024-07-01
 pandoc   3.1.1 @ /Applications/RStudio.app/Contents/Resources/app/quarto/bin/tools/ (via rmarkdown)

─ Packages ───────────────────────────────────────────────────────────────────
 package          * version  date (UTC) lib source
 ade4               1.7-22   2023-02-06 [1] CRAN (R 4.4.0)
 ape                5.8      2024-04-11 [1] CRAN (R 4.4.0)
 backports          1.5.0    2024-05-23 [1] CRAN (R 4.4.0)
 bayestestR         0.13.2   2024-02-12 [1] RSPM (R 4.4.0)
 Biobase            2.64.0   2024-04-30 [1] Bioconductor 3.19 (R 4.4.0)
 BiocGenerics       0.50.0   2024-04-30 [1] Bioconductor 3.19 (R 4.4.0)
 biomformat         1.32.0   2024-04-30 [1] Bioconductor 3.19 (R 4.4.0)
 Biostrings         2.72.0   2024-04-30 [1] Bioconductor 3.19 (R 4.4.0)
 broom            * 1.0.6    2024-05-17 [1] CRAN (R 4.4.0)
 ca                 0.71.1   2020-01-24 [1] CRAN (R 4.4.0)
 cachem             1.1.0    2024-05-16 [1] CRAN (R 4.4.0)
 cli                3.6.3    2024-06-21 [1] CRAN (R 4.4.0)
 cluster            2.1.6    2023-12-01 [1] CRAN (R 4.4.0)
 codetools          0.2-20   2024-03-31 [1] CRAN (R 4.4.0)
 colorspace         2.1-0    2023-01-23 [1] CRAN (R 4.4.0)
 corncob            0.4.1    2024-01-10 [1] CRAN (R 4.4.0)
 correlation        0.8.5    2024-06-16 [1] RSPM (R 4.4.0)
 crayon             1.5.3    2024-06-20 [1] CRAN (R 4.4.0)
 data.table         1.15.4   2024-03-30 [1] CRAN (R 4.4.0)
 datawizard         0.11.0   2024-06-05 [1] RSPM (R 4.4.0)
 digest             0.6.36   2024-06-23 [1] CRAN (R 4.4.0)
 dplyr            * 1.1.4    2023-11-17 [1] CRAN (R 4.4.0)
 effectsize         0.8.8    2024-05-12 [1] RSPM (R 4.4.0)
 evaluate           0.24.0   2024-06-10 [1] CRAN (R 4.4.0)
 fansi              1.0.6    2023-12-08 [1] CRAN (R 4.4.0)
 farver             2.1.2    2024-05-13 [1] CRAN (R 4.4.0)
 fastmap            1.2.0    2024-05-15 [1] CRAN (R 4.4.0)
 forcats          * 1.0.0    2023-01-29 [1] CRAN (R 4.4.0)
 foreach            1.5.2    2022-02-02 [1] CRAN (R 4.4.0)
 generics           0.1.3    2022-07-05 [1] CRAN (R 4.4.0)
 GenomeInfoDb       1.40.1   2024-05-24 [1] Bioconductor 3.19 (R 4.4.0)
 GenomeInfoDbData   1.2.12   2024-05-26 [1] Bioconductor
 ggforce            0.4.2    2024-02-19 [1] CRAN (R 4.4.0)
 ggplot2          * 3.5.1    2024-04-23 [1] CRAN (R 4.4.0)
 ggraph             2.2.1    2024-03-07 [1] CRAN (R 4.4.0)
 ggrepel            0.9.5    2024-01-10 [1] CRAN (R 4.4.0)
 ggsignif           0.6.4    2022-10-13 [1] RSPM (R 4.4.0)
 ggstatsplot      * 0.12.3   2024-04-06 [1] RSPM (R 4.4.0)
 glue               1.7.0    2024-01-09 [1] CRAN (R 4.4.0)
 graphlayouts       1.1.1    2024-03-09 [1] CRAN (R 4.4.0)
 gridExtra          2.3      2017-09-09 [1] CRAN (R 4.4.0)
 gtable             0.3.5    2024-04-22 [1] CRAN (R 4.4.0)
 here             * 1.0.1    2020-12-13 [1] CRAN (R 4.4.0)
 hms                1.1.3    2023-03-21 [1] CRAN (R 4.4.0)
 htmltools          0.5.8.1  2024-04-04 [1] CRAN (R 4.4.0)
 htmlwidgets        1.6.4    2023-12-06 [1] CRAN (R 4.4.0)
 httr               1.4.7    2023-08-15 [1] CRAN (R 4.4.0)
 igraph             2.0.3    2024-03-13 [1] CRAN (R 4.4.0)
 insight            0.20.1   2024-06-11 [1] RSPM (R 4.4.0)
 IRanges            2.38.0   2024-04-30 [1] Bioconductor 3.19 (R 4.4.0)
 iterators          1.0.14   2022-02-05 [1] CRAN (R 4.4.0)
 jsonlite           1.8.8    2023-12-04 [1] CRAN (R 4.4.0)
 knitr              1.47     2024-05-29 [1] CRAN (R 4.4.0)
 labeling           0.4.3    2023-08-29 [1] CRAN (R 4.4.0)
 lattice            0.22-6   2024-03-20 [1] CRAN (R 4.4.0)
 lifecycle          1.0.4    2023-11-07 [1] CRAN (R 4.4.0)
 lubridate        * 1.9.3    2023-09-27 [1] CRAN (R 4.4.0)
 magrittr           2.0.3    2022-03-30 [1] CRAN (R 4.4.0)
 MASS               7.3-60.2 2024-04-24 [1] local
 Matrix             1.7-0    2024-03-22 [1] CRAN (R 4.4.0)
 memoise            2.0.1    2021-11-26 [1] CRAN (R 4.4.0)
 mgcv               1.9-1    2023-12-21 [1] CRAN (R 4.4.0)
 microbiome         1.26.0   2024-04-30 [1] Bioconductor 3.19 (R 4.4.0)
 microViz         * 0.12.4   2024-06-30 [1] https://david-barnett.r-universe.dev (R 4.4.0)
 multtest           2.60.0   2024-04-30 [1] Bioconductor 3.19 (R 4.4.0)
 munsell            0.5.1    2024-04-01 [1] CRAN (R 4.4.0)
 nlme               3.1-164  2023-11-27 [1] CRAN (R 4.4.0)
 paletteer          1.6.0    2024-01-21 [1] RSPM (R 4.4.0)
 parameters         0.21.7   2024-05-14 [1] RSPM (R 4.4.0)
 patchwork          1.2.0    2024-01-08 [1] RSPM (R 4.4.0)
 permute            0.9-7    2022-01-27 [1] CRAN (R 4.4.0)
 phyloseq         * 1.48.0   2024-04-30 [1] Bioconductor 3.19 (R 4.4.0)
 pillar             1.9.0    2023-03-22 [1] CRAN (R 4.4.0)
 pkgconfig          2.0.3    2019-09-22 [1] CRAN (R 4.4.0)
 plyr               1.8.9    2023-10-02 [1] CRAN (R 4.4.0)
 polyclip           1.10-6   2023-09-27 [1] CRAN (R 4.4.0)
 prismatic          1.1.2    2024-04-10 [1] RSPM (R 4.4.0)
 purrr            * 1.0.2    2023-08-10 [1] CRAN (R 4.4.0)
 R6                 2.5.1    2021-08-19 [1] CRAN (R 4.4.0)
 Rcpp               1.0.12   2024-01-09 [1] CRAN (R 4.4.0)
 readr            * 2.1.5    2024-01-10 [1] CRAN (R 4.4.0)
 registry           0.5-1    2019-03-05 [1] CRAN (R 4.4.0)
 rematch2           2.1.2    2020-05-01 [1] CRAN (R 4.4.0)
 reshape2           1.4.4    2020-04-09 [1] CRAN (R 4.4.0)
 rhdf5              2.48.0   2024-04-30 [1] Bioconductor 3.19 (R 4.4.0)
 rhdf5filters       1.16.0   2024-04-30 [1] Bioconductor 3.19 (R 4.4.0)
 Rhdf5lib           1.26.0   2024-04-30 [1] Bioconductor 3.19 (R 4.4.0)
 rlang              1.1.4    2024-06-04 [1] CRAN (R 4.4.0)
 rmarkdown          2.27     2024-05-17 [1] CRAN (R 4.4.0)
 rprojroot          2.0.4    2023-11-05 [1] CRAN (R 4.4.0)
 rstudioapi         0.16.0   2024-03-24 [1] CRAN (R 4.4.0)
 Rtsne              0.17     2023-12-07 [1] CRAN (R 4.4.0)
 S4Vectors          0.42.0   2024-04-30 [1] Bioconductor 3.19 (R 4.4.0)
 scales             1.3.0    2023-11-28 [1] CRAN (R 4.4.0)
 seriation          1.5.5    2024-04-17 [1] CRAN (R 4.4.0)
 sessioninfo        1.2.2    2021-12-06 [1] CRAN (R 4.4.0)
 statsExpressions   1.5.4    2024-03-20 [1] RSPM (R 4.4.0)
 stringi            1.8.4    2024-05-06 [1] CRAN (R 4.4.0)
 stringr          * 1.5.1    2023-11-14 [1] CRAN (R 4.4.0)
 survival           3.5-8    2024-02-14 [1] CRAN (R 4.4.0)
 tibble           * 3.2.1    2023-03-20 [1] CRAN (R 4.4.0)
 tidygraph          1.3.1    2024-01-30 [1] CRAN (R 4.4.0)
 tidyr            * 1.3.1    2024-01-24 [1] CRAN (R 4.4.0)
 tidyselect         1.2.1    2024-03-11 [1] CRAN (R 4.4.0)
 tidyverse        * 2.0.0    2023-02-22 [1] CRAN (R 4.4.0)
 timechange         0.3.0    2024-01-18 [1] CRAN (R 4.4.0)
 TSP                1.2-4    2023-04-04 [1] CRAN (R 4.4.0)
 tweenr             2.0.3    2024-02-26 [1] CRAN (R 4.4.0)
 tzdb               0.4.0    2023-05-12 [1] CRAN (R 4.4.0)
 UCSC.utils         1.0.0    2024-05-06 [1] Bioconductor 3.19 (R 4.4.0)
 utf8               1.2.4    2023-10-22 [1] CRAN (R 4.4.0)
 vctrs              0.6.5    2023-12-01 [1] CRAN (R 4.4.0)
 vegan              2.6-6.1  2024-05-21 [1] CRAN (R 4.4.0)
 viridis            0.6.5    2024-01-29 [1] CRAN (R 4.4.0)
 viridisLite        0.4.2    2023-05-02 [1] CRAN (R 4.4.0)
 withr              3.0.0    2024-01-16 [1] CRAN (R 4.4.0)
 writexl          * 1.5.0    2024-02-09 [1] CRAN (R 4.4.0)
 xfun               0.45     2024-06-16 [1] CRAN (R 4.4.0)
 XVector            0.44.0   2024-04-30 [1] Bioconductor 3.19 (R 4.4.0)
 yaml               2.3.8    2023-12-11 [1] CRAN (R 4.4.0)
 zeallot            0.1.0    2018-01-28 [1] RSPM (R 4.4.0)
 zlibbioc           1.50.0   2024-04-30 [1] Bioconductor 3.19 (R 4.4.0)

 [1] /Library/Frameworks/R.framework/Versions/4.4-arm64/Resources/library

──────────────────────────────────────────────────────────────────────────────